Fix LLMM1 kernel #28

fxmarty · 2024-05-31T16:03:25Z

As per title.

The kernel is broken and produces incorrect generations, for example with TP=2 and Llama 2 7B. There are two main issues in the kernel:

The number of threads in a block is not a multiple of the warp size, even though the kernel uses warp shuffle operations.
The indexing of the block of rows is wrong, and so is the indexing used to load the weight matrix.
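
To illustrate the first issue: warp shuffle operations exchange values between the lanes of a full warp, so a launch configuration is usually rounded up to a whole number of warps. The sketch below is not the code from this PR; `round_up_to_multiple` is a hypothetical helper, and `WARP_SIZE = 64` is an assumption (the wavefront size on many AMD GPUs; NVIDIA GPUs use 32).

```python
def round_up_to_multiple(value: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= `value`."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


# Assumption: 64 lanes per warp (wavefront); adjust for the target hardware.
WARP_SIZE = 64

for requested_threads in (100, 128, 200):
    # Pad the requested thread count so that every warp in the block is full.
    padded = round_up_to_multiple(requested_threads, WARP_SIZE)
    print(requested_threads, "->", padded)
```

With a padded block size, threads beyond the valid range still need a bounds check before loading data, so that they contribute a neutral value to the shuffle reduction.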

Example repro:

import torch
# from gemv_ext import LLMM1
from vllm import _custom_C

# Number of output features (rows of the weight matrix) and batch size.
m = 4096
n = 1

# Sweep the reduction dimension k to expose shape-dependent errors.
for k in range(200, 1600, 8):
    with torch.no_grad():
        down_proj_weight = torch.rand(m, k, device="cuda", dtype=torch.float16)
        inp = torch.rand(n, k, device="cuda", dtype=torch.float16) - 0.5

        # Reference result from PyTorch.
        out_functional = torch.nn.functional.linear(inp, down_proj_weight)

        # Result from the custom kernel, written into out_custom.
        out_custom = torch.empty(n, m, dtype=inp.dtype, device="cuda")
        _custom_C.LLMM1(down_proj_weight, inp, out_custom, 4)

        absdiff = (out_functional.to(torch.float32) - out_custom.to(torch.float32)).abs()

        # Mean absolute difference relative to the mean magnitude of the reference.
        print("------ k:", k)
        print("  relative error:", absdiff.mean().item() / out_functional.abs().mean().item())

dllehr-amd

@fxmarty I appreciate the PR! We were mainly focused on TP=8 and TP=1. These changes look good to me.

fxmarty · 2024-06-12T12:37:50Z

Thank you, I fixed the linting conflict.

shajrawi · 2024-06-13T14:00:35Z

CI is still failing:

Run codespell --toml pyproject.toml
./csrc/custom/custom_kernels.cu:88: accross ==> across
Error: Process completed with exit code 65.

fxmarty · 2024-06-14T09:55:30Z

@shajrawi thank you, should be fixed

shajrawi · 2024-06-14T14:37:54Z

Merged! Thanks for the fix!!

fix gemv kernel

f91b214

gshtras requested a review from dllehr-amd May 31, 2024 16:59

amathews-amd requested a review from shajrawi June 4, 2024 21:09

fxmarty mentioned this pull request Jun 6, 2024

ROCm and sliding windows fixes huggingface/text-generation-inference#2033

Merged

add guard for A load

c10263b

dllehr-amd approved these changes Jun 11, 2024

fxmarty added 2 commits June 12, 2024 14:30

Merge branch 'main_rocm_original' into fix-llmm1-kernel

72a8dbf

linting

aa58624

fix typo

54fcf12

shajrawi merged commit d3da246 into ROCm:main Jun 14, 2024

mawong-amd mentioned this pull request Jun 19, 2024

Refactor custom gemm heuristics #56

Merged
